ABSTRACT

The ongoing High Accuracy Radial velocity Planet Search (HARPS) has found that 30-50% of GK dwarfs in the solar neighborhood host planets with $M_{\rm pl} < M_{\rm Nep}$ in orbits of $P < 50$ days. At first glance, this overall occurrence rate seems inconsistent with the planet frequency measured during Q0-Q2 of the Kepler Mission, whose 1,235 detected planetary candidates imply that ~15% of main-sequence dwarfs harbor short-period planets with $R_{\rm pl} < 4\,R_\oplus$. A rigorous comparison between the two surveys is difficult, however, as they observe different stellar populations and measure different planetary properties. Here we report the results of a Monte Carlo study that can account for this discrepancy via plausible distributions of planetary compositions. We find that a population consisting concurrently of (1) dense silicate-iron planets and (2) low-density gas-dominated worlds provides a natural fit to the current data. In this scenario, the fraction of dense planets decreases with increasing mass, from $f_{\rm rocky} \sim 90\%$ at $M \sim 1\,M_\oplus$ to $f_{\rm rocky} = 10\%$ at $M \sim M_{\rm Nep}$. Our best-fit population has a total occurrence rate of 40% for $2 < P < 50$ days and $1 < M < 17\,M_\oplus$, and is characterized by simple power-law indices of the form $N(M)\,dM \propto M^{\alpha}\,dM$ and $N(P)\,dP \propto P^{\beta}\,dP$ with $\alpha = -1.0$ and $\beta = 0.0$. Our model population therefore contains four free parameters and is readily testable with future observations. Furthermore, our model's insistence that at least two distinct types of planets must exist in the survey data indicates that multiple formation mechanisms are at work to produce the population of planets commonly referred to as "super-Earths".
Subject headings: planets and satellites: general — methods: statistical — methods: numerical 
1. INTRODUCTION

The past decade has seen an extraordinary increase in our understanding of short-period extrasolar planets, due in large part to the plethora of transit and radial velocity (RV) detections made by a host of planet search surveys. Now, with over 500 planets known (Wright et al. 2010; Schneider et al. 2011), the state of the art has progressed to the point where statistical studies of entire planet populations are realistically feasible (Cumming et al. 2008; Howard et al. 2010; Ford et al. 2011; Latham et al. 2011; Moorhead et al. 2011; Howard et al. 2011; Wittenmyer et al. 2011; Youdin 2011; Tremaine & Dong 2011). However, a substantial challenge still lies in synthesizing the results from different surveys into a cohesive picture of the Galactic planetary population, as each technique provides different information about the planets' physical characteristics and is subject to different selection biases.

These cross-survey considerations are especially important when one tries to compare the results of Doppler velocity surveys with the results of photometric transit surveys. Not only do these two detection methods generally sample different regions of the Galaxy, but they also implement different observing strategies, owing to the intrinsically low geometric probability of a planetary transit and to the strict spectroscopic requirements needed to achieve 1 m/s precision in RV (see, for example, Borucki et al. 2010; Koch et al. 2010; Batalha et al. 2010; Rupprecht et al. 200…). The result of these fundamental differences is that most RV-detected planets do not transit, and that most transiting planets lack high-precision Doppler follow-up measurements. All is not lost, however: if these biases are properly accounted for, then one can use the statistical properties of the two samples to draw conclusions about the Galactic distribution of planetary properties.

A particularly valuable outcome of the transit-RV comparison stems from the distinction between measuring a planet's radius via a transit and measuring its mass via RV observations. Because of this difference, when a Doppler-characterized planet is observed to transit, its range of possible compositions can be modeled even in the absence of any other observational constraints, through individual mass-to-radius (M-R) relationships calculated for a variety of interior planetary structures (e.g., Fortney et al. 2007; Seager et al. 2007; Rogers et al. 2011). Unfortunately, planets that are well characterized by both methods are rare. Statistical techniques are therefore needed to make headway in understanding the compositional distribution of planet populations, under the assumption that transit and RV surveys, once corrected for selection bias, adequately sample the full range and frequency of planetary compositions.

Before these population-wide M-R relationships can be properly interpreted, it is essential to understand how they fundamentally differ from the M-Rs that are calculated using structural models of individual planets. On the most basic level, transforming a population of planetary masses into radii requires that a range of compositions be included a priori. While it is certainly true that inferring an individual planet's composition from its mass and radius is a degenerate problem, yielding a range of possible part-iron, part-silicate, part-gas compositions, the bulk density of a planet, $\rho(M, R)$, is nonetheless a deterministic quantity. This connection is absent, however, when one compares a transiting planet population to a Doppler-detected population and the two samples have very few planets in common. As a result, bulk density is essentially a free parameter in transit-RV comparisons, and some assumptions must be made about it, or about the compositions that correspond to it.
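The deterministic link between $M$, $R$, and $\rho$ can be made concrete in Earth units. This minimal sketch assumes Earth's mean density of 5.51 g cm$^{-3}$ as the reference. The two radii in the usage lines are hypothetical inputs, chosen only to show how strongly $\rho$ depends on $R$ at fixed mass.

```python
RHO_EARTH = 5.51  # g cm^-3, Earth's mean bulk density (reference value)


def bulk_density(mass_earth: float, radius_earth: float) -> float:
    """Bulk density rho = 3M / (4 pi R^3), written relative to Earth.

    mass_earth is M / M_earth and radius_earth is R / R_earth;
    the result is in g cm^-3.
    """
    return RHO_EARTH * mass_earth / radius_earth**3


# Hypothetical 5 M_earth planets: a compact one and a puffy one.
compact = bulk_density(5.0, 1.6)
puffy = bulk_density(5.0, 2.6)
print(f"{compact:.2f} g/cm^3 vs {puffy:.2f} g/cm^3")
```

At equal mass, the compact planet comes out roughly four times denser than the puffy one. Without a measured radius for each RV planet, that factor is exactly what a transit-RV comparison has to assume.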

A key issue for the transit-RV comparison is how one chooses to parameterize planetary composition over the entire population. The simplest case would be if all planets had the same composition, as this enables the planets' masses to be straightforwardly converted to radii. However, Howard et al. (2011) have already shown that this very simple M-R fails to match the Kepler planet candidates when a power law is used for the planetary mass distribution, and so we consider more flexible and more physically motivated M-Rs in this paper. In particular, we find that the choice of these compositional parameters is crucial for reconciling any apparent statistical disparities between RV and transiting planet populations.

The plethora of ongoing planet searches enables the Galactic planetary census to be illuminated in a number of different ways. Two of the most influential surveys to date are the Kepler Mission, which found 1,235 transiting planet candidates in its first four months of data (Borucki et al. 2011), and the Geneva High Accuracy Radial velocity Planet Search (HARPS), which has discovered over 85 planets around hundreds of the brightest stars in the solar neighborhood (Ségransan et al. 2011). Both of these surveys are in a position to unearth the population of low-mass short-period planets and to provide statistics about their relative frequency. These statistics hint suggestively at the prevalence of truly Earth-like planets and are of particular interest for planet formation theories that strive to explain or predict the mass-distance distribution of planet populations (Ida & Lin 2004; Kornet & Wolf …).

Alarmingly, the low-mass planet occurrence rates measured by the two surveys appear to conflict with one another. Systematic statistical analyses of the short-period Kepler planet candidates have yielded 0.130 ± 0.008 planets per solar-type star (Howard et al. 2011) or 0.19 planets per solar-type star (Youdin 2011), with the planets having $2 < R_{\rm pl} < 4\,R_\oplus$ and $P < 50$ days. On the other hand, preliminary results from the HARPS planet search (Lovis et al. 2009; Mayor et al. 2009; Udry 2010) indicate that 30-50% of Sun-like stars host sub-Neptune-mass planets within 50-day orbits, a planet frequency substantially higher than the Kepler occurrence rate. Although these two occurrence rates provide somewhat different information, as discussed in §5, the following order-of-magnitude argument gives a sense of the apparent discrepancy in terms of the total number of planets that Kepler would have detected in its first four months of data.



Given a 40% occurrence rate and ~150,000 Kepler target stars, there are 60,000 potentially detectable planets in Kepler's field of view, assuming that each host star harbors only one planet. Not all of these planets will transit, however, as the required star-planet-observer alignment is fairly improbable for randomly oriented orbits. For planets in orbits of 50 days or less, this geometric transit probability is roughly 1-15%. Taking a 5% transit probability (10-day orbit) as a benchmark, the number of sub-Neptune-mass planets that Kepler would have been able to detect is approximately 3,000. If we map the Kepler planet candidate radii to masses via the simple relation $M/M_\oplus = (R/R_\oplus)^{2.06}$ (Lissauer et al. 2011b), then about 900 of Kepler's planet candidates fall in the $M < M_{\rm Nep}$ range. Thus, the HARPS occurrence rate appears to overestimate the number of planets that Kepler would have detected by a factor of ~3.
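The arithmetic above can be reproduced in a few lines. This sketch assumes a Sun-like host (1 $R_\odot$, 1 $M_\odot$), circular orbits, and the standard geometric transit probability $p \approx R_{\rm star}/a$. The count of 900 Kepler candidates is taken from the text rather than recomputed, since the candidate catalog is not included here. The function names are illustrative.

```python
import math

R_SUN_AU = 0.00465047  # solar radius in AU


def semimajor_axis_au(period_days, mstar_msun=1.0):
    """Kepler's third law, a^3 = M_star * P^2 (AU, years, solar masses)."""
    period_yr = period_days / 365.25
    return (mstar_msun * period_yr**2) ** (1.0 / 3.0)


def transit_probability(period_days, rstar_rsun=1.0, mstar_msun=1.0):
    """Geometric transit probability R_star / a for a circular orbit."""
    return rstar_rsun * R_SUN_AU / semimajor_axis_au(period_days, mstar_msun)


def mass_from_radius(radius_earth):
    """Lissauer et al. (2011b) relation: M/M_earth = (R/R_earth)**2.06."""
    return radius_earth ** 2.06


def radius_from_mass(mass_earth):
    """Inverse relation: R/R_earth = (M/M_earth)**(1/2.06)."""
    return mass_earth ** (1.0 / 2.06)


for p in (2.0, 10.0, 50.0):
    print(f"P = {p:4.0f} d: transit probability = {transit_probability(p):.3f}")

# Under this M-R, M < 17 M_earth corresponds to R below about 4 R_earth.
print(f"R(17 M_earth) = {radius_from_mass(17.0):.2f} R_earth")

n_targets = 150_000
occurrence = 0.40
n_planets = occurrence * n_targets      # one planet per host star
n_transiting = 0.05 * n_planets         # 5% benchmark (~10-day orbit)
n_kepler_sub_neptune = 900              # candidate count quoted in the text
print(f"{n_planets:.0f} planets, {n_transiting:.0f} transiting, "
      f"excess factor {n_transiting / n_kepler_sub_neptune:.1f}")
```

For P = 2, 10, and 50 days the geometric probability comes out near 15%, 5%, and 2%. The final ratio is about 3.3, and the inverted relation shows why the $M < M_{\rm Nep}$ cut and the $R < 4\,R_\oplus$ cut select comparable planets.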

Order-of-magnitude arguments can be misleading, however, so in this paper we take care to fully account for the details of the RV-transit comparison that may affect this result. These include the enhanced geometric transit probability of eccentric orbits, the shallower transits of more inclined orbits due to stellar limb darkening, target star selection biases, and Kepler's detection incompleteness. By conducting a detailed comparison between the second-quarter Kepler planet candidates and the generalized HARPS statistic on the occurrence rate of low-mass planets around solar-type stars, we provide a systematic statistical analysis of the compositions of a truly sub-Neptune-mass exoplanet population. In the process, we identify the first physically motivated mass-to-radius relationship for a population of low-mass, short-period planets that can reproduce the occurrence rates observed by both RV and transit planet searches.

The layout of this paper is as follows. In §2 we briefly summarize the Kepler data set and discuss the use of planet candidates instead of confirmed planets for our analysis. In §3 we describe our simulations and statistical calculations. In §4 we present our results on the total number of planets that Kepler would have been able to detect given the HARPS occurrence rate, and in §5 we discuss the implications of these results for our current understanding of exoplanet populations.

2. THE KEPLER DATA SET 

The Kepler Mission is a 3.5-year search for potentially habitable Earth-sized planets around solar-type stars (Borucki et al. 2010). To detect these planets, Kepler monitors ~150,000 stars in its >100 deg$^2$ field of view (Koch et al. 2010) for periodic photometric dips that fit the shape and duration of a planetary transit. The telescope's ~0.01″ per hour pointing stability (Koch et al. 2010) and 10-100 ppm photometric precision (Jenkins et al. 2010) enable the detection of planets that are Earth-sized or smaller. Furthermore, its Earth-trailing heliocentric orbit facilitates continuous data acquisition without the diurnal or annual cycles that generate aliases (Koch et al. 2010). These characteristics, along with the low false positive detection rate discussed below, enable robust statistical analysis of the detected planets' properties, including the small planetary radii and short periods that are the focus of this study.

On February 1, 2011, Kepler released its second-quarter (Q2) data, which was soon followed by the announcement of 1,235 transiting planet candidates (Borucki et al. 2011). It is necessary to note, however, that the vast majority of these candidates are unconfirmed and thus retain "planet candidate" status. The current consensus is that these candidates can be catalogued as true planets only if they exhibit transit timing variations or are detected through the radial velocity method. Other astrophysical events, such as binary blends with background stars, eclipsing hierarchical triples with small separations, and certain types of stellar variability, can mimic planetary transits (Gautier et al. 2010; Morton & Johnson 2011). Unfortunately, the majority of Kepler's target stars have V > 11 and are thus faint for the purpose of Doppler follow-up. This makes the additional RV measurements expensive and leaves the vast majority of the Kepler candidates unconfirmed.

To compensate for these observational limitations, the Kepler team has developed an extensive vetting process to eliminate as many of these false planetary transit signatures as possible (Gautier et al. 2010; Borucki et al. 2011). Inevitably, however, a small but non-negligible fraction of false positives is expected to persist in the list of planet candidates. Borucki et al. (2011) estimate that this false positive fraction is as high as 20%. A detailed Bayesian analysis conducted by Morton & Johnson (2011), by contrast, finds that the transit-depth-independent false alarm probability is <5% over the entire field of view. That result assumes stars with Kepler magnitude Kp < 16, a 30-50% planet occurrence prior, and follow-up astrometry that can identify binaries at any Kp with separations >2″. When this last assumption is relaxed, the false alarm probability increases with decreasing transit depth for the fainter target stars and can exceed 15%, as illustrated by the false positive probabilities that Morton & Johnson (2011) compute individually for each Kepler planet candidate. We thus keep in mind that the total number of true Kepler planets is likely ~5-15% lower than the number of candidates reported by Borucki et al. (2011) as we proceed with our statistical analysis of the Kepler planet candidate population.

3. THE TRANSIT-RV COMPARISON: METHODS 

Comparing the HARPS occurrence rate with Kepler's planet candidates involves several steps. First, we require that the aggregate properties of our initial planet population are consistent with the cumulative characteristics of the low-mass population detected by the HARPS survey (§3.1). To compare these planets to Kepler's public data set, we map our initial distribution of planet masses to radii via a population-wide mass-to-radius (M-R) relationship (§3.2). Each simulated planet is subsequently matched to a Kepler target star (§3.3), and its light curve is computed based on analytic transit formulae (Mandel & Agol 2002). We then apply Kepler's detection criteria (Jenkins et al. 2010) to assess whether or not that planet would have been detected by the end of the second quarter (§3.4). Finally, we use both the two-dimensional Kolmogorov-Smirnov test over period and radius and the one-dimensional $\chi^2$ test over radius to assess the quality of the fit between our simulated planet population and Kepler's planet candidates, and to identify the values of the free parameters that best fit Kepler's Q2 data (§4).
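The one-dimensional $\chi^2$ comparison over radius can be sketched as follows. This is an illustration under stated assumptions. It uses Pearson's statistic with the simulated counts as the expected values and skips empty bins, which may differ from the binning and weighting used in this paper. The bin edges, radii, and expected counts are made-up toy numbers, not Kepler data.

```python
import bisect


def histogram(values, edges):
    """Count values falling in the half-open bins [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect.bisect_right(edges, v) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def chi_square(observed, expected):
    """Pearson chi^2 = sum_k (O_k - E_k)^2 / E_k over bins with E_k > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


edges = [1.0, 2.0, 3.0, 4.0]                 # toy radius bins in R_earth
kepler_like = histogram([1.2, 1.5, 2.2, 2.5, 2.8, 3.1], edges)
simulated = [2.5, 2.0, 1.5]                  # toy expected counts per bin
print(kepler_like, chi_square(kepler_like, simulated))
```

In a full analysis, the expected counts would come from the detected subset of a simulated population, scaled to the number of Kepler candidates.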



3.1. Simulations of the Radial Velocity Population 

Other than stating that 30-50% of Sun-like stars host sub-Neptune-mass planets with P < 50 days, the HARPS overall occurrence rate does not address specific details of the planetary mass-period distribution. Accordingly, we must select a general, easily parameterized distribution that is able to recover the HARPS overall occurrence rate. Power laws meet these criteria, so we follow common practice (Cumming et al. 2008; Howard et al. 2010, 2011; Youdin 2011) and adopt:



$$N(M)\,dM = N_{\rm tot}\, C_M\, M^{\alpha}\, dM, \tag{1}$$



where $N(M)\,dM$ is the number of planets that have a mass between $M$ and $M + dM$, $N_{\rm tot}$ is the total number of planets in the sample, $C_M$ is a normalization constant, and $\alpha$ is the mass power-law index. Similarly, we take for the period distribution:



$$N(P)\,dP = N_{\rm tot}\, C_P\, P^{\beta}\, dP. \tag{2}$$



We use the HARPS overall occurrence rate to determine $N_{\rm tot}$, $C_M$, and $C_P$ for our simulated populations. $N_{\rm tot}$ is simply the planet occurrence rate times the total number of stars that Kepler is observing, assuming that each star that harbors a planet harbors no more than that one planet (the bare minimum suggested by the HARPS statistic). Given that Kepler observed over 110,000 G and K dwarfs during its second quarter (Q2) of data (Kepler Mission Team …), this leads to $N_{\rm tot} \sim 55{,}000$ for a 50% occurrence rate. $C_M$ and $C_P$ are determined by setting minimum and maximum values for mass and period in our simulations. The maximum period of 50 days is explicitly given by the stated HARPS occurrence rate, as are the limits on planet mass if we define a sub-Neptune planet to have $1 < M < 17\,M_\oplus \approx M_{\rm Nep}$. It is important to emphasize that we consider only low planetary masses here; Jupiter-mass planets are not included in our simulations.
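Equations (1) and (2) can be sampled by inverting their cumulative distributions. The minimal sketch below assumes the limits quoted above (1-17 $M_\oplus$, 2-50 days) and the best-fit $\alpha = -1.0$, $\beta = 0.0$ from the abstract. The hypothetical helper `normalization` returns the constant ($C_M$ or $C_P$) that makes the distribution integrate to one over its range. Index $-1$ is handled separately because the integral becomes logarithmic.

```python
import math
import random


def normalization(index, lo, hi):
    """C such that the integral of C * x**index over [lo, hi] is 1 (cf. C_M, C_P)."""
    if abs(index + 1.0) < 1e-9:            # index = -1: logarithmic integral
        return 1.0 / math.log(hi / lo)
    k = index + 1.0
    return k / (hi**k - lo**k)


def sample_power_law(index, lo, hi, rng):
    """Draw x in [lo, hi] with dN/dx proportional to x**index (inverse CDF)."""
    u = rng.random()
    if abs(index + 1.0) < 1e-9:            # index = -1: x is log-uniform
        return lo * (hi / lo) ** u
    k = index + 1.0
    return (lo**k + u * (hi**k - lo**k)) ** (1.0 / k)


rng = random.Random(2011)
n_tot = int(0.5 * 110_000)                 # 50% occurrence, one planet per star
alpha, beta = -1.0, 0.0
masses = [sample_power_law(alpha, 1.0, 17.0, rng) for _ in range(n_tot)]
periods = [sample_power_law(beta, 2.0, 50.0, rng) for _ in range(n_tot)]
print(n_tot, normalization(alpha, 1.0, 17.0), normalization(beta, 2.0, 50.0))
```

The tolerance on `index + 1` guards against floating-point drift when the index is stepped across a grid in increments of 0.1. With $\alpha = -1$ the median mass sits near $\sqrt{17} \approx 4.1\,M_\oplus$.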

The minimum value of period, while not expressly indicated in the HARPS low-mass occurrence rate, can be reasonably chosen from existing trends. Both the census of Kepler planet candidates (Borucki et al. 2011) and the population of planets discovered through the radial velocity method (Wright et al. 2010) suggest that there is a dearth of planets with P < 2 days that is not due to the selection biases of the different detection methods. Howard et al. (2011) fit a power-law distribution with an exponential cutoff at short periods to the Kepler planet candidates and found that the transitional period varies from 2 to 7 days for planets with $2 < R < 6\,R_\oplus$. To simplify our input distributions, we ignore the exponential cutoff and set $P_{\rm min} = 2$ days, keeping in mind that any deviation from a power law for $2 < P < 7$ days may affect our ability to fit Kepler's observed distribution.

A rigorous interpretation of the HARPS statistic would include the unknown $\sin i$ factor on the observed masses. We note, however, that the distribution of inclinations for the observed radial velocity planets is poorly understood and that spherical isotropy cannot be assumed, owing to the detection biases inherent in the radial velocity technique. Some insight may be gleaned from statistical analyses such as that of Ho & Turner (2010) or from the few planets that exhibit the Rossiter-McLaughlin effect (Schlaufman 2010). In this analysis, however, we take our mass limits as the bounds on the true mass of our simulated planets, effectively ignoring any refinements stemming from the $\sin i$ effect.

3.1.1. Simulation Parameters 

To account for the ambiguity in the RV mass and period distributions, we treat the power-law indices $\alpha$ and $\beta$ as free parameters in our simulations: we allow $\alpha$ to vary from -2.5 to 0 and $\beta$ to vary from -0.5 to 0.5, both in increments of 0.1. We draw the eccentricity, $e$, and the longitude of periastron, $\omega$, from uniform distributions over $0 < e < 0.2$ and $0 < \omega < 2\pi$, and draw the inclination, $i$, from an isotropic distribution of orbit orientations. Taken with $P$, these orbital elements determine which planets transit, given their geometric transit probability. We include non-zero eccentricities because eccentric orbits can enhance the probability of a transit. We set the upper bound at $e = 0.2$, however, with the expectation that many short-period planets will have experienced a significant degree of tidal circularization. This bound is broadly consistent with the observed eccentricity distribution of confirmed planets in our mass and period range, which shows that a vast majority (~80%) of low-mass planets with P < 50 days have $e < 0.2$ (Wright et al. 2010).
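The orbital draws and the transit condition can be sketched as follows. Several choices here are assumptions not stated in the text. Isotropy is implemented as $\cos i$ uniform on [0, 1], and $\omega$ is treated as the planet's argument of periastron. The transit criterion is the standard one: the impact parameter at conjunction, $b = (a\cos i/R_{\rm star})(1-e^2)/(1+e\sin\omega)$, must be smaller than $1 + R_p/R_{\rm star}$. Grazing transits count as transits; whether they are detectable is a separate question.

```python
import math
import random

R_SUN_AU = 0.00465047  # solar radius in AU


def draw_orbit(rng, e_max=0.2):
    """Draw (e, omega, i): e ~ U(0, e_max), omega ~ U(0, 2 pi), and an
    isotropic orbit orientation (cos i uniform on [0, 1])."""
    e = rng.uniform(0.0, e_max)
    omega = rng.uniform(0.0, 2.0 * math.pi)
    inc = math.acos(rng.uniform(0.0, 1.0))
    return e, omega, inc


def transits(a_au, rstar_rsun, rp_over_rstar, e, omega, inc):
    """True if the impact parameter at conjunction is below 1 + Rp/R_star."""
    a_over_rstar = a_au / (rstar_rsun * R_SUN_AU)
    b = a_over_rstar * math.cos(inc) * (1.0 - e**2) / (1.0 + e * math.sin(omega))
    return b < 1.0 + rp_over_rstar


def transit_fraction(n, a_au, rp_over_rstar, e_max, seed=0):
    """Monte Carlo fraction of randomly oriented orbits that transit a Sun-like star."""
    rng = random.Random(seed)
    hits = sum(transits(a_au, 1.0, rp_over_rstar, *draw_orbit(rng, e_max))
               for _ in range(n))
    return hits / n


# ~10-day orbit around a Sun-like star (a ~ 0.091 AU) with Rp/R_star = 0.02
print(transit_fraction(100_000, 0.091, 0.02, e_max=0.0),
      transit_fraction(100_000, 0.091, 0.02, e_max=0.2))
```

Averaged over $\omega$, the transit probability scales as $1/(1-e^2)$. For $e$ uniform up to 0.2 this is only about a 1.4% enhancement, comparable to the Monte Carlo noise at this sample size.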

Two more free parameters are introduced for the second M-R we consider in this paper (§3.2.2), as we allow the fraction of rocky planets in the population to vary as a linear function of mass. These fractions are then used to randomly assign each planet either a gaseous or a rocky composition. In addition, we randomly allocate each planet to a Kepler target star, as discussed in §3.3.

3.2. Population-wide Mass-to-Radius Relationships

A crucial consideration for the transit-RV comparison is the population-wide M-R used to map an RV planet's mass to a transiting planet's radius. Howard et al. (2011) have shown that applying one bulk density to an entire planet population fails to match the Kepler candidates. We therefore begin our investigation with more flexible and physically motivated M-Rs, while taking care to minimize the number of degrees of freedom. In particular, we consider two population-wide M-Rs in this paper: a power-law fit to measured planetary masses and radii, and a multi-valued parameterization that relaxes the single-valued assumption involved in fitting a power law to data.

3.2.1. Single-Valued M-R 

Lissauer et al. (2011b) use the following power-law fit to Earth and Saturn as the mass-radius relation for Kepler's planet candidates:

$$\frac{M}{M_\oplus} = \left(\frac{R}{R_\oplus}\right)^{2.06}, \tag{3}$$

which tacitly assumes that extrasolar planets resemble those in our Solar System. Experience has shown that such an approach requires caution, so as a check we derive a comparable M-R for the nine transiting extrasolar planets with $1 < M < 17\,M_\oplus$ (Wright et al. 2010): CoRoT-7 b (Queloz et al. 2009; Léger et al. 2009), GJ 1214 b (Charbonneau et al. 2009), Kepler-10 b (Batalha et al. 2011), Kepler-11 b-f (Lissauer et al. 2011a), and 55 Cnc e (Winn et al. 2011; Demory et al. …). Including the error on radius, we find the following best fit for the radii of these planets given their masses:

$$\frac{R}{R_\oplus} = 0.87^{+\cdots}_{-0.08}\left(\frac{M}{M_\oplus}\right)^{0.45\pm0.06}, \tag{4}$$

which is encouragingly close to the inverse of the M-R used by Lissauer et al. (2011b). However, the mass measurement errors are much larger than those on radius, so when we compute the fit in the other direction we find

$$\frac{M}{M_\oplus} = 5.8^{+\cdots}_{-1.2}\left(\frac{R}{R_\oplus}\right)^{0.30\pm0.22}, \tag{5}$$

which has substantial error bars and does not invert to give Equation (4). Thus, the M-R computed directly from the dually detected, low-mass extrasolar planets is poorly constrained, and we proceed cautiously with $R/R_\oplus = (M/M_\oplus)^{0.48}$ from Equation (3).

3.2.2. Multi-Valued M-R

While the Lissauer et al. (2011b) M-R implicitly incorporates compositional variation from planet to planet, it does not allow for the possibility of a multi-valued M-R. This is potentially a severe shortcoming, as a more complex M-R has appeared to be necessary from the outset of observational constraints on low-mass planet compositions. The first two planets with measured radii and masses, CoRoT-7 b (Queloz et al. 2009; Léger et al. 2009) and GJ 1214 b (Charbonneau et al. 2009), yielded very different densities (6 g cm$^{-3}$ and 2 g cm$^{-3}$, respectively), despite having similar masses (4.9 $M_\oplus$ and 6.5 $M_\oplus$).

With this observational evidence in mind, we believe that the key to reconciling the Kepler and HARPS occurrence rates may be a multi-valued low-mass M-R. Our parameterization assumes that each simulated planet has either a significant gaseous composition (Neptune analogs; an extension of the gas giants to lower masses) or a rocky composition (Earth analogs; an extension of the terrestrial planets to higher masses). It further assumes that the admixture of these two compositions varies as a linear function of mass for $1 < M < 17\,M_\oplus$. This admixture is quantified by the fraction of rocky planets in the population, $f_{\rm rocky}(M)$. If, for example, $f_{\rm rocky}(1) = 1.0$ and $f_{\rm rocky}(17) = 0.0$, then $f_{\rm rocky}(9) = 0.5$: all $1\,M_\oplus$ planets would be rocky, all $17\,M_\oplus$ planets would be gaseous, and the $9\,M_\oplus$ planets would be evenly divided between the two compositions. In our simulations we allow $f_{\rm rocky}(1)$ and $f_{\rm rocky}(17)$ to vary between 0 and 1 in increments of 0.1, giving two more free parameters in our simulations.
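The linear admixture amounts to an interpolation followed by a Bernoulli draw for each planet. The sketch below uses hypothetical helpers, not the authors' code. It assumes each planet's composition is assigned independently. The endpoint values 0.9 and 0.1 in the usage lines are the best-fit values quoted in the abstract.

```python
import random


def f_rocky(mass, f1, f17, m_lo=1.0, m_hi=17.0):
    """Rocky fraction at `mass` (M_earth), linear between f_rocky(1) and f_rocky(17)."""
    return f1 + (f17 - f1) * (mass - m_lo) / (m_hi - m_lo)


def assign_composition(mass, f1, f17, rng):
    """Label one planet 'rocky' with probability f_rocky(mass), else 'gaseous'."""
    return "rocky" if rng.random() < f_rocky(mass, f1, f17) else "gaseous"


rng = random.Random(7)
print(f_rocky(9.0, 1.0, 0.0))            # midpoint of the 1-17 M_earth range
labels = [assign_composition(5.0, 0.9, 0.1, rng) for _ in range(10_000)]
print(labels.count("rocky") / len(labels))
```

With the best-fit endpoints, a $5\,M_\oplus$ planet has a 70% chance of being rocky, and the sampled fraction should land close to that value.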

For the Earth analogs in this multi-valued M-R we use the Solar System's terrestrial-planet population-wide mass-to-radius relationship, $R/R_\oplus = (M/M_\oplus)^{0.33}$. We emphasize that this rocky M-R is not just a re-expression of the individual mass-to-radius relationship for a constant-density sphere. Instead, this population-wide M-R was derived by fitting a power law to all of the Solar System's inner planets, much as the $R \propto M^{0.48}$ relationship was derived above.
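A population-wide fit of this kind can be illustrated with an unweighted least-squares fit of $\ln R$ against $\ln M$ for Mercury, Venus, Earth, and Mars, using standard mass and radius values in Earth units. The fitting method is an assumption. With these inputs it returns an exponent of about 0.32, close to but not identical with the 0.33 adopted here, so it should not be read as a reproduction of the authors' fit.

```python
import math

# (M / M_earth, R / R_earth) for the inner Solar System planets
INNER_PLANETS = {
    "Mercury": (0.0553, 0.383),
    "Venus": (0.815, 0.949),
    "Earth": (1.0, 1.0),
    "Mars": (0.107, 0.532),
}


def fit_power_law(pairs):
    """Unweighted least squares for ln R = ln A + k ln M; returns (A, k)."""
    xs = [math.log(m) for m, _ in pairs]
    ys = [math.log(r) for _, r in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    return math.exp(mean_y - k * mean_x), k


amplitude, exponent = fit_power_law(list(INNER_PLANETS.values()))
print(f"R/R_earth ~ {amplitude:.2f} (M/M_earth)^{exponent:.2f}")
```

Because the fit spans planets of different bulk composition, its exponent need not equal the constant-density value of 1/3, even though the two happen to be close here.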

For the Neptune analogs we use the M-R curves calculated by Rogers et al. (2011). These authors model the structure of low-mass planets with substantial gaseous envelopes by invoking a core accretion formation history. They then self-consistently incorporate the effect of planetary equilibrium temperature, $T_{\rm eq}$, across the range of orbital periods and stellar fluxes that we consider here. They find, however, that the M-R curves of constant gaseous envelope mass fraction, $M_{\rm env}$, are remarkably insensitive to planet mass above $\sim 7\,M_\oplus$. Because no single $M_{\rm env}$ provides the dynamic range needed to explain the diversity of radii that Kepler observes, we must allow the envelope fraction to vary in order to construct a population-wide M-R that can reasonably reproduce the observed radius range. Noting that the M-R curves are roughly equally spaced in $R$ for approximately logarithmic bins in $M_{\rm env}$, we randomly choose an envelope mass fraction from a log-uniform distribution between $10^{-5}$ and $10^{-1}$. Finally, using Figure 4 of Rogers et al. (2011), we interpolate our simulated planets' radii as a function of $M$, $M_{\rm env}$, and $T_{\rm eq}$.
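The envelope-fraction draw is easy to sketch. The interpolation in the Rogers et al. (2011) grid is not, since it requires their tabulated curves, so this sketch stops at the draw. Sampling uniformly in $\log_{10} M_{\rm env}$ puts equal probability in each decade. Given the roughly even spacing of the curves described above, this choice should spread the resulting radii fairly evenly.

```python
import math
import random


def draw_envelope_fraction(rng, lo=1e-5, hi=1e-1):
    """Draw M_env from a log-uniform distribution on [lo, hi]."""
    return 10.0 ** rng.uniform(math.log10(lo), math.log10(hi))


rng = random.Random(5)
draws = [draw_envelope_fraction(rng) for _ in range(100_000)]
# Each of the four decades should hold about a quarter of the draws,
# so about half should fall below 1e-3.
print(sum(d < 1e-3 for d in draws) / len(draws))
```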

As Rogers et al. (2011) illustrate, varying $M_{\rm env}$ allows planets with masses as small as $2\,M_\oplus$ to have radii as large as $7\,R_\oplus$. This enables planets less massive than Neptune to fall within the $2 < R < 6\,R_\oplus$ range that Kepler has found to be well populated (Borucki et al. 2011; Howard et al. 2011). However, these relatively low-mass, large-radius planets are particularly susceptible to atmospheric mass loss. Depending on the amount of irradiation they receive from their host star, they may not actually be able to hold onto their gaseous envelopes. Following the discussion in Rogers et al. (2011), we incorporate the possibility of mass loss in our population-wide M-R via the following timescale argument.

As illustrated by Lammer et al. (2003), one must consider the effect that X-ray and extreme ultraviolet (XUV) irradiation has on a planet's thermal structure in order to treat atmospheric mass loss realistically. In the regime where the amount of energy incident on the planet determines the degree of atmospheric escape, this mass loss is parameterized by (Lecavelier des Etangs 2007; Valencia et al. 2010; Rogers et al. 2011)

$$\dot{M} = \frac{\epsilon \pi F_{\rm XUV} R_{\rm XUV}^2 R_p}{G M_p K_{\rm tide}}, \tag{6}$$

where $F_{\rm XUV}$ is the XUV flux incident on the planet from the host star; $\epsilon$ is the fraction of incident XUV energy that is actually absorbed by the atmospheric particles; $R_{\rm XUV}$ is the planet radius at which the XUV flux is absorbed; $R_p$ is the radius of the planet as calculated from planetary interior structure models; $M_p$ is the mass of the planet; and $K_{\rm tide}$ is a tidal correction factor of order unity for planets with $R < R_{\rm Nep}$ and $P > 2$ days. Unfortunately, $\epsilon$ is largely unknown, so at best Equation (6) provides an order-of-magnitude estimate for $\dot{M}$. We follow Rogers et al. (2011) in setting $\epsilon = 0.1$ and $F_{\rm XUV} = F_{{\rm XUV},\oplus} = 4.6 \times 10^{-3}$ W m$^{-2}$ (Ribas et al. 2005). We then scale $F_{\rm XUV}$ by the equilibrium temperature of the planet, which depends on the radius and effective temperature of the host star and on the semimajor axis of the planet's orbit. From the mass loss timescales plotted by Rogers et al. (2011) we estimate that $R_{\rm XUV} \sim 10\,R_p$ for these short-period low-mass planets. With $\dot{M}$ thus determined, the atmospheric mass loss timescale is

$$t_{\rm loss} = \frac{M_{\rm envelope}}{\dot{M}}. \tag{7}$$

If $t_{\rm loss} < 1$ Gyr, we consider the planet to have completely lost its gaseous envelope, and we take the radius of the planet to be the radius of its 50% rock, 50% ice core (Fortney et al. 2007).
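Equations (6) and (7) and the 1 Gyr criterion combine into a short calculation. The sketch below works in SI units with $\epsilon = 0.1$ and $K_{\rm tide} = 1$ as defaults, and it leaves the scaling of $F_{\rm XUV}$ with $T_{\rm eq}$ to the caller. The planet in the usage lines is hypothetical, chosen only to exercise the functions: $4\,M_\oplus$, $3\,R_\oplus$, a 1% envelope by mass, $R_{\rm XUV} = 10\,R_p$ following the estimate above, and XUV fluxes of 1 and 100 times $F_{{\rm XUV},\oplus}$.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24     # kg
R_EARTH = 6.371e6      # m
GYR = 3.156e16         # seconds in 1 Gyr
F_XUV_EARTH = 4.6e-3   # W m^-2, XUV flux at Earth adopted in the text


def mass_loss_rate(f_xuv, r_xuv, r_p, m_p, eps=0.1, k_tide=1.0):
    """Energy-limited escape, Eq. (6), in kg/s."""
    return eps * math.pi * f_xuv * r_xuv**2 * r_p / (G * m_p * k_tide)


def loss_timescale(m_env, mdot):
    """Eq. (7): t_loss = M_envelope / Mdot, in seconds."""
    return m_env / mdot


def envelope_is_stripped(m_env, mdot, threshold_s=1.0 * GYR):
    """Criterion used in the text: the envelope is lost if t_loss < 1 Gyr."""
    return loss_timescale(m_env, mdot) < threshold_s


# Hypothetical planet: 4 M_earth, 3 R_earth, 1% envelope by mass, R_XUV = 10 R_p.
m_p, r_p = 4.0 * M_EARTH, 3.0 * R_EARTH
m_env = 0.01 * m_p
for flux_factor in (1.0, 100.0):
    mdot = mass_loss_rate(flux_factor * F_XUV_EARTH, 10.0 * r_p, r_p, m_p)
    t_gyr = loss_timescale(m_env, mdot) / GYR
    print(flux_factor, round(t_gyr, 2), envelope_is_stripped(m_env, mdot))
```

Because $t_{\rm loss} \propto 1/(F_{\rm XUV} R_{\rm XUV}^2)$, this hypothetical planet keeps its envelope at Earth-like XUV flux but loses it at 100 times that flux. Order-of-magnitude changes in the poorly known $\epsilon$ or $R_{\rm XUV}$ can therefore flip the outcome.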

3.3. Star Selection 

Once we apply an M-R to the simulated RV population, we randomly allocate planets to specific Kepler target stars. This one-to-one matching allows us to sidestep the concern that the selection biases of different detection methods will significantly influence computed planet occurrence rates (Howard et al. 2011), and it lets us compare our simulated population directly with Kepler's planet candidates. Accordingly, we adopt the list of 165,000 long-cadence Q2 Kepler target stars to initiate our star selection. We begin by extracting four quantities from each star's Q2 FITS header: the photometrically derived effective temperature, $T_{\rm eff}$; the surface gravity, log(g); the radius, $R_{\rm star}$; and the Kepler-bandpass apparent magnitude, Kp. These data originate from the Kepler Input Catalog (KIC; Kepler Mission Team 2009), which has known errors of ±200 K on $T_{\rm eff}$ and ±0.4 dex on log(g) (Brown et al. 2011). Because these two parameters are used to calculate $R_{\rm star}$, the errors on the planet candidates' radii can be significant; in §5 we discuss the possible effect of these errors on our results.

In their analysis of Kepler's planet candidates, Howard et al. (2011) compute Kepler's observed occurrence rates from a heavily vetted list of target stars whose total noise in one quarter of data enables detection of an $R > 2\,R_\oplus$ planet with SNR > 10. This approach prompts them to drop all stars with Kp > 15 and all planets with $R < 2\,R_\oplus$ due to concerns about sample incompleteness. By contrast, our approach retains the entire Kepler target star sample, with only the log(g) cut discussed below. We individually simulate each planet's light curve to accurately determine its detectability (§3.4) and then ask how many planets Kepler would have seen in its first four months of data if the HARPS occurrence rate is true (§4). We thereby naturally account for the incompleteness in Kepler's first four months of data. (This incompleteness is displayed graphically in Figure 1 as the smallest-radius planet that each star could have detected by the end of Q2.) Thus, our simulation procedure permits us to include dimmer stars and smaller planets with radii down to $1\,R_\oplus$, which allows us to draw conclusions from a larger sample with improved statistics.
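The idea behind a per-star minimum detectable radius can be sketched with a common signal-to-noise approximation, $\mathrm{SNR} \approx (\delta/\mathrm{CDPP3})\sqrt{N_{\rm tr}\,T_{\rm dur}/3\,\mathrm{h}}$. Here $\delta = (R_p/R_{\rm star})^2$ is the transit depth and $N_{\rm tr}$ is the number of transits observed. This formula is an assumption, not necessarily the detection criterion of §3.4. The 7.1σ threshold, the 120-day baseline standing in for "four months", the central-transit duration $T \approx (P/\pi)(R_{\rm star}/a)$, and the neglect of limb darkening are all illustrative choices.

```python
import math

R_SUN_AU = 0.00465047      # solar radius in AU
R_SUN_IN_R_EARTH = 109.1   # solar radius in Earth radii


def transit_duration_hours(period_days, rstar_rsun=1.0, mstar_msun=1.0):
    """Central-transit duration for a circular orbit, T ~ (P / pi) * (R_star / a)."""
    a_au = (mstar_msun * (period_days / 365.25) ** 2) ** (1.0 / 3.0)
    return 24.0 * period_days / math.pi * (rstar_rsun * R_SUN_AU / a_au)


def min_detectable_radius(cdpp3_ppm, rstar_rsun, period_days=20.0,
                          baseline_days=120.0, snr_threshold=7.1,
                          mstar_msun=1.0):
    """Smallest planet radius (R_earth) whose stacked transits reach snr_threshold."""
    n_transits = math.floor(baseline_days / period_days)
    t_dur = transit_duration_hours(period_days, rstar_rsun, mstar_msun)
    # SNR ~ (depth / CDPP3) * sqrt(n_transits * t_dur / 3 h); solve for the depth.
    depth = snr_threshold * cdpp3_ppm * 1e-6 / math.sqrt(n_transits * t_dur / 3.0)
    return math.sqrt(depth) * rstar_rsun * R_SUN_IN_R_EARTH


for cdpp in (30.0, 100.0, 300.0):
    print(cdpp, round(min_detectable_radius(cdpp, 1.0), 2))
```

Under these assumptions the threshold scales as $\sqrt{\mathrm{CDPP3}}$ and shrinks for smaller stars. This matches the qualitative trends with Kp and $T_{\rm eff}$ described in the caption of Figure 1.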

The only severe cut we make to the 165,000 available Q2 target stars is in log(g). We restrict potential planet-hosting stars to those with log(g) > 4.0 to minimize contamination from subgiants, as the KIC's surface gravities are poorly constrained above T_eff ~ 5400 K (Brown et al. 2011). The resulting list consists of 131,000 stars (Figure 1), the vast majority (> 110,000) of which are G and K dwarfs. Nonetheless, a small proportion of subgiants and giants, whose radii may be underestimated in the KIC by as much as a factor of 2 (Brown et al. 2011), likely remains in our target star sample. Without knowledge about the degree of subgiant contamination, we cannot accurately account for their statistical effect in our results, although we expect that this effect will be very small based on the low numbers of possibly misclassified evolved stars found by Basri et al. (2011).

Figure 1. Right: apparent magnitude, Kp, and effective temperature, T_eff, from the Kepler Input Catalog (KIC) for the Kepler target stars included in our simulations (§3.3). All of these stars have KIC log(g) > 4.0. Left: number of log(g) > 4.0 Kepler target stars in each apparent magnitude bin. The color represents the smallest planet around each target star that Kepler could have detected in its first four months of data, assuming an orbit with P = 20 days and e = 0. With the same orbital parameters for each planet size, this minimum detectable radius is thus determined by the radius of the star, R_star, and by the star's total photometric noise on a three-hour timescale, CDPP_3 (§3.4). In the scatterplot, note the general trend of minimum detectable radius with both Kp and T_eff, which correlate with CDPP_3 and R_star, respectively. The histogram to the right more clearly illustrates the trend of increasing minimum detectable radius with increasing Kp (due to increasing CDPP_3). However, it is important to note that there do exist dim target stars around which Kepler could have already detected a 1 to 1.5 R_⊕ planet. This is a result of the trend of decreasing minimum radius with decreasing R_star and the fact that low-mass stars exist in every Kp bin.

3.4. Detectability of Simulated Planets 

To pinpoint the simulated planets that Kepler would have identified as planet candidates after four months of data collection, we first compute analytic light curves (Mandel & Agol 2002) for the simulated planets that transit according to their geometric transit probability (Seagroves et al. 2003). These light curves incorporate the planets' eccentricity and inclination as well as the Kepler-bandpass limb-darkening coefficients calculated by Claret & Bloemen (2011) for a large range of stellar effective temperatures, surface gravities, and metallicities. Using a 30-minute cadence over 132 days to match Kepler's long-cadence Q0-Q2 datasets, we determine the transit depth, duration, and total number of transit events directly from the simulated light curves.
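The geometric part of this step can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: it assumes the standard eccentric-orbit transit probability (R_star/a)(1 + e sin ω)/(1 - e^2) with the planet radius neglected, Kepler's third law with the planet mass neglected, and a gap-free 132-day baseline. It does not reproduce the Mandel & Agol light curves; the function names are hypothetical.

```python
import math
import random

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
R_SUN = 6.957e8      # solar radius, m
DAY = 86400.0        # seconds per day


def semi_major_axis_m(period_days, m_star_msun):
    """Kepler's third law with the planet mass neglected; returns metres."""
    p = period_days * DAY
    return (G * m_star_msun * M_SUN * p ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)


def transit_probability(period_days, m_star_msun, r_star_rsun, e=0.0, omega=0.0):
    """Geometric transit probability (R_star / a) * (1 + e sin(omega)) / (1 - e^2)."""
    a = semi_major_axis_m(period_days, m_star_msun)
    return (r_star_rsun * R_SUN / a) * (1.0 + e * math.sin(omega)) / (1.0 - e ** 2)


def count_transits(period_days, t0_days, baseline_days=132.0):
    """Count mid-transit times t0 + k*P inside [0, baseline); data gaps are ignored."""
    n_max = int(baseline_days / period_days) + 2
    return sum(1 for k in range(n_max) if 0.0 <= t0_days + k * period_days < baseline_days)


rng = random.Random(1)
p_tr = transit_probability(20.0, 1.0, 1.0)      # Sun-like star, P = 20 d, circular orbit
transits = rng.random() < p_tr                  # Bernoulli draw: does this planet transit?
n_tr = count_transits(20.0, rng.uniform(0.0, 20.0))  # random transit phase
```

For a Sun-like star and P = 20 days the probability is roughly 3%, and depending on phase either 6 or 7 transits fall inside the 132-day window, which is why the number of transits is drawn per planet rather than fixed per period.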

As described in Batalha et al. (2010), Kepler's detectability criterion is set such that fewer than one false-positive planet detection over its 3.5-year mission would result from purely statistical fluctuations in stellar photon counts. This requirement gives a 7.1σ threshold for a transit's statistical significance when the light curve is folded and binned. The detectability of a planet therefore depends on both R_pl/R_star and a number of stellar parameters and instrumental properties that affect the total noise (Batalha et al. 2010; Jenkins et al. 2010). These noise sources are difficult to assess without intimate knowledge of Kepler's performance, so we use the noise calculated directly by the Kepler data reduction pipeline, the Combined Differential Photometric Precision (CDPP, obtained from J. Christiansen & J. Jenkins via personal communication, June 7, 2011), to reproduce as accurately as possible the planet population that Kepler could have identified by the end of Q2.
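The logic behind a 7.1σ threshold can be checked with Gaussian tail arithmetic: the expected number of noise-only detections is the number of independent trials times the one-sided tail probability. The sketch below assumes purely Gaussian noise, and the trial count N_TESTS is a hypothetical order of magnitude chosen for illustration, not a figure taken from this paper or from the Kepler pipeline documentation.

```python
import math


def gaussian_tail(threshold_sigma):
    """One-sided probability that pure Gaussian noise exceeds threshold_sigma."""
    return 0.5 * math.erfc(threshold_sigma / math.sqrt(2.0))


def expected_false_alarms(threshold_sigma, n_independent_tests):
    """Expected number of noise-only detections over n independent trials."""
    return n_independent_tests * gaussian_tail(threshold_sigma)


# N_TESTS is a hypothetical order of magnitude, for illustration only.
N_TESTS = 1e12
for threshold in (6.0, 7.1, 8.0):
    print(f"{threshold:.1f} sigma: tail = {gaussian_tail(threshold):.2e}, "
          f"false alarms = {expected_false_alarms(threshold, N_TESTS):.2f}")
```

Under these assumptions the tail probability at 7.1σ is of order 10^-12, so the expected number of false alarms stays below one, while a 6σ threshold would admit hundreds.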

Defined as the root mean square of stellar photometric noise on transit timescales, the CDPP provides the most accurate estimate of the noise from each target star that would interfere with a transiting planet's detectability. A wavelet-based, adaptive matched filter is applied to the corrected Kepler light curves in the Transiting Planet Search section of the Science Processing Pipeline (Jenkins et al. 2010) to produce 3-hour, 6-hour, and 12-hour CDPP estimates, which are then used to calculate the statistical significance of a possible transit event. Incorporating Kepler's own noise metric in our simulations automatically folds in its detection biases and accounts for sample incompleteness below 2 R_⊕; therefore, we can extend our analysis down to Earth-sized planets without reservations about hidden selection effects.

Our simulations only consider planets with 2 < P < 50 days, so the 3-hour CDPP estimate is the most relevant for our purposes. Matching each planet to a Kepler target star also matches it to a CDPP value, so we scale this noise estimate by the transit duration and the total number of transit events observed during Q0-Q2 (Batalha et al. 2010; Howard et al. 2011). Our detectability criterion therefore becomes:



$$\mathrm{SNR} = \frac{\delta\,\sqrt{N_{\rm tr}\,N_{\rm dur}}}{\mathrm{CDPP}_3} > 7.1, \qquad (8)$$



where δ ∝ (R_pl/R_star)^2 is the maximum transit depth (in ppm) identified from the analytic light curves, CDPP_3 is the Q2 3-hour Combined Differential Photometric Precision (in ppm) associated with the planet's host star, N_tr is the number of observed transits in four months, and N_dur is the number of data points acquired per transit on a 30-minute cadence. We note that δ is proportional but not equal to (R_pl/R_star)^2 because we include a range of possible transit-producing inclinations and self-consistently incorporate the effect of limb darkening based on the host star's T_eff and log(g).
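The detectability test can be written as a short function. This sketch takes Equation (8) to have the form SNR = δ √(N_tr N_dur) / CDPP_3, which is our reading of the typeset equation, and leaves the computation of δ, N_tr, and N_dur (which the text derives from the analytic light curves) to the caller. The function names are hypothetical.

```python
import math

DETECTION_THRESHOLD = 7.1  # sigma, following Batalha et al. (2010)


def transit_snr(depth_ppm, cdpp3_ppm, n_transits, n_points_per_transit):
    """SNR = delta * sqrt(N_tr * N_dur) / CDPP_3, with delta and CDPP_3 in ppm."""
    if n_transits <= 0 or n_points_per_transit <= 0:
        return 0.0  # nothing observed in the window, so nothing to detect
    return depth_ppm * math.sqrt(n_transits * n_points_per_transit) / cdpp3_ppm


def is_detectable(depth_ppm, cdpp3_ppm, n_transits, n_points_per_transit):
    """Apply the 7.1-sigma detectability criterion."""
    snr = transit_snr(depth_ppm, cdpp3_ppm, n_transits, n_points_per_transit)
    return snr > DETECTION_THRESHOLD


# Example: 100 ppm transit, 50 ppm CDPP_3, 4 transits of 9 points each -> SNR = 12
print(transit_snr(100.0, 50.0, 4, 9), is_detectable(100.0, 50.0, 4, 9))
```

Because N_tr and N_dur enter only through their product, a planet with a few long transits and one with many short transits can reach the same SNR.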

Figure 1 illustrates our detectability criterion graphically, with the color scale showing the smallest planet for each log(g) > 4.0 target star that Kepler could have detected after four months of data collection, assuming an orbit with P = 20 days and e = 0. As expected, this minimum detectable R_pl trends with both Kp and T_eff, which correlate with CDPP and R_star, respectively. When the orbit is not held constant, an individual planet's detectability is also determined by its orbital period, as given by N_tr in Equation (8).
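The minimum detectable radius shown in Figure 1 can be approximated by inverting the SNR criterion. This is a simplified sketch under stated assumptions: a central (b = 0) transit on a circular P = 20 day orbit, no limb darkening (so δ = (R_pl/R_star)^2), the conservative transit count floor(132 d / P), and SNR = δ √(N_tr N_dur) / CDPP_3 as the form of Equation (8). The constants are rounded values, and the function name is hypothetical.

```python
import math

R_SUN_IN_R_EARTH = 109.1   # approximate solar radius in Earth radii
AU_IN_R_SUN = 215.03       # approximate astronomical unit in solar radii


def min_detectable_radius(cdpp3_ppm, r_star_rsun, m_star_msun, period_days=20.0,
                          baseline_days=132.0, cadence_hr=0.5, threshold=7.1):
    """Smallest R_pl (Earth radii) passing SNR > threshold for a central transit.

    Assumes e = 0, impact parameter 0, no limb darkening (delta = (R_pl/R_star)^2),
    and the conservative transit count floor(baseline / P).
    """
    a_rsun = AU_IN_R_SUN * (m_star_msun * (period_days / 365.25) ** 2) ** (1.0 / 3.0)
    duration_hr = 24.0 * (period_days / math.pi) * math.asin(r_star_rsun / a_rsun)
    n_dur = int(duration_hr / cadence_hr)        # 30-minute points per transit
    n_tr = int(baseline_days // period_days)     # transits in Q0-Q2
    if n_dur == 0 or n_tr == 0:
        return math.inf
    depth_min = threshold * cdpp3_ppm * 1e-6 / math.sqrt(n_tr * n_dur)
    return r_star_rsun * R_SUN_IN_R_EARTH * math.sqrt(depth_min)


# A quieter star or a smaller star lowers the detection floor (cf. Figure 1).
for cdpp, r_star, m_star in [(60.0, 1.0, 1.0), (200.0, 1.0, 1.0), (60.0, 0.6, 0.6)]:
    print(cdpp, r_star, round(min_detectable_radius(cdpp, r_star, m_star), 2))
```

The two trends in Figure 1 fall out directly: larger CDPP_3 (dimmer stars) raises the floor, while smaller R_star lowers it, which is why some dim M and K dwarfs could already reveal Earth-sized planets.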

3.5. Statistics of the Detectable Period-Radius Distribution

The above procedure gives us the period-radius distribution that Kepler would observe when the underlying planet population conforms to the HARPS occurrence rate. Our lack of detailed knowledge about the HARPS data set, however, has introduced some freedom in the population's initial mass and period distributions. Different input distributions can significantly affect the total number of planets that are detectable by Kepler (§4.1, Figure 2), so we must identify the free parameters that produce detectable planet distributions that best fit Kepler's planet candidates before we can conclude that any discrepancy between the total numbers of observed planets is a result of the overall 30-50% statistic. A general sense of the appropriate α's and β's can be gleaned from the analyses of Howard et al. (2011) and Youdin (2011), but because we begin with a Doppler survey rather than a transit survey and because we consider multiple-valued mass-to-radius relationships, an independent assessment of the best-fit mass and period distributions is valuable.
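Drawing a simulated population from the power laws N(M)dM ∝ M^α dM and N(P)dP ∝ P^β dP (as defined in the abstract) can be done by inverse-transform sampling. The sketch below assumes both power laws are truncated to 1-17 M_⊕ and 2-50 days; the sampling method and the function name are our illustrative choices, not necessarily those of the original simulations.

```python
import random
import statistics


def sample_power_law(rng, index, lo, hi):
    """Draw x in [lo, hi] with dN/dx proportional to x**index (inverse-CDF method)."""
    u = rng.random()
    if abs(index + 1.0) < 1e-12:             # index = -1: the CDF is logarithmic
        return lo * (hi / lo) ** u
    k = index + 1.0
    return (lo ** k + u * (hi ** k - lo ** k)) ** (1.0 / k)


rng = random.Random(42)
alpha, beta = -1.0, 0.0                      # one of the (alpha, beta) pairs explored
masses = [sample_power_law(rng, alpha, 1.0, 17.0) for _ in range(10000)]    # M_earth
periods = [sample_power_law(rng, beta, 2.0, 50.0) for _ in range(10000)]    # days
print(round(statistics.median(masses), 2), round(statistics.mean(periods), 1))
```

With α = -1 the masses are log-uniform (median near √17 ≈ 4.1 M_⊕), and with β = 0 the periods are uniform (mean near 26 days); steeper negative α pushes more of the population toward 1 M_⊕, which is one reason N_detect is so sensitive to α.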

Before we can make this comparison, however, we need to filter the list of 1,235 Kepler planet candidates to match the limits we impose on the simulated population. Accordingly, we retain only those candidates with 1 < R < 4 R_⊕ ≈ R_Nep and 2 < P < 50 days orbiting stars with Kp < 16. We also impose a cut on the candidates in multiple-planet systems, including only the first planet listed by the Kepler Science Processing Pipeline in this radius and period range; in most cases, this is the planet labeled ".01". This cut conforms to our assumption of one planet per host star and reduces the total number of Kepler planets in our radius and period range from 797 to 631, a difference of 166 (~20%). Depending on the multiple-planet prescription that could be applied to the HARPS occurrence rate, our single-planet assumption may either overestimate or underestimate the difference in the total number of planets between the simulation and the full Kepler low-mass data set.
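The candidate cuts above can be expressed as a small filter. This is a hedged sketch: the Candidate record and its fields are hypothetical rather than the actual Kepler catalog columns, and "first listed by the pipeline" is approximated here by the lowest numerical suffix (".01", ".02", ...) among a host's in-range planets.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    koi: str        # hypothetical identifier such as "123.01"; integer part = host star
    radius: float   # planet radius, R_earth
    period: float   # orbital period, days
    kp: float       # Kepler-bandpass magnitude of the host star


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Keep 1 < R < 4 R_earth, 2 < P < 50 d, Kp < 16, and one planet per host."""
    def listing_order(c: Candidate):
        host, suffix = c.koi.split(".")
        return (host, int(suffix))

    kept = {}
    for c in sorted(candidates, key=listing_order):
        if not (1.0 < c.radius < 4.0 and 2.0 < c.period < 50.0 and c.kp < 16.0):
            continue
        host = c.koi.split(".")[0]
        kept.setdefault(host, c)   # first in-range planet per host wins
    return list(kept.values())


sample = [Candidate("100.02", 2.5, 10.0, 14.0), Candidate("100.01", 3.0, 5.0, 14.0),
          Candidate("101.01", 5.0, 10.0, 13.0), Candidate("101.02", 2.0, 30.0, 13.0),
          Candidate("102.01", 1.5, 20.0, 16.5)]
print([c.koi for c in filter_candidates(sample)])
```

Note that in this sketch the range cut is applied before the one-per-host cut, so a host whose ".01" planet falls outside the window (host 101 above) still contributes its first in-range planet, consistent with the "first planet ... in this radius and period range" wording.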

Figure 2. The total number of detectable (§3.4) simulated planets (N_detect) produced by the R = M^0.48 mass-to-radius relationship (§3.2.1) for a 40% overall occurrence rate in the 2 < P < 50 days and 1 < R < 4 R_⊕ range; this total number is averaged over 100 realizations. As a point of reference, the total number of Kepler planet candidates that we use for this comparison is 631. The axes denote the period power-law index (β) and mass power-law index (α), which serve as free parameters in our simulations (§3.1). The white box outlines the values of α and β that produce a median probability 𝒫_sing > 10^-4 over all N = 100 realizations.

To address the goodness of fit between the Kepler-detectable, simulated period-radius distribution and the Kepler candidates' period-radius distribution, we employ the two-sample two-dimensional Kolmogorov-Smirnov (2-D K-S) test (Fasano & Franceschini 1987) as well as the sample-size-independent one-dimensional χ² test over planet radius. The 2-D K-S test avoids binning data, unlike the χ² test, and thus maximally preserves the information contained in the radius-period distribution of the detectable simulated planets. However, the K-S test is most sensitive to the middle of its data range, and so we use the 1-D χ² test to distinguish between a 2-D K-S statistic produced by a good fit in the 2 R_⊕ < R < 3 R_⊕ range and a similar 2-D K-S statistic produced by a good fit over the entire 1 R_⊕ < R < 4 R_⊕ range. We use the sample-size-independent version of the χ² statistic to fairly assess the goodness of fit between different power-law indices, as this fit would otherwise be dominated by differences in the total number of detectable planets.
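For readers unfamiliar with the 2-D K-S test, the sketch below computes the Fasano & Franceschini (1987) two-sample statistic: each point of each sample in turn serves as the origin of four quadrants, and D is the mean of the largest quadrant-fraction differences found with each sample's points as origins. The probability uses the approximation popularized in Press et al.'s Numerical Recipes (our assumption; the paper does not state which probability formula was used). This is an O(N^2) teaching implementation with hypothetical function names, not the code used here.

```python
import math


def _quadrant_fractions(x0, y0, xs, ys):
    """Fractions of (xs, ys) in the four quadrants around (x0, y0)."""
    counts = [0, 0, 0, 0]
    for x, y in zip(xs, ys):
        if x > x0:
            counts[0 if y > y0 else 3] += 1
        else:
            counts[1 if y > y0 else 2] += 1
    return [c / len(xs) for c in counts]


def _max_quadrant_difference(origins, s1, s2):
    d = 0.0
    for x0, y0 in origins:
        f1 = _quadrant_fractions(x0, y0, *s1)
        f2 = _quadrant_fractions(x0, y0, *s2)
        d = max(d, max(abs(a - b) for a, b in zip(f1, f2)))
    return d


def _pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def _q_ks(lam):
    """Kolmogorov survival function Q_KS(lam) = 2 sum_j (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here and Q_KS is ~1 anyway
    total = sum(2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, total))


def ks2d(x1, y1, x2, y2):
    """Two-sample 2-D K-S statistic D and an approximate probability."""
    s1, s2 = (x1, y1), (x2, y2)
    d1 = _max_quadrant_difference(zip(x1, y1), s1, s2)
    d2 = _max_quadrant_difference(zip(x2, y2), s1, s2)
    d = 0.5 * (d1 + d2)
    n1, n2 = len(x1), len(x2)
    sqen = math.sqrt(n1 * n2 / (n1 + n2))
    r1, r2 = _pearson_r(x1, y1), _pearson_r(x2, y2)
    rr = math.sqrt(max(0.0, 1.0 - 0.5 * (r1 * r1 + r2 * r2)))
    return d, _q_ks(d * sqen / (1.0 + rr * (0.25 - 0.75 / sqen)))
```

In this setting x would be orbital period and y planet radius, with sample 1 the Kepler candidates and sample 2 the detectable simulated planets from one realization; repeating over realizations gives the median probabilities quoted in §4.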

In practice, using the χ² test for the planet radius distribution does not discard much information, as the Kepler planet candidates' radii, rounded to the nearest 0.1 R_⊕, are already effectively binned. To make sure that this artificial structure in the Kepler period-radius distribution does not determine the quality of the best fit, we also round the simulated planets' radii to the nearest 0.1 R_⊕ before computing the 2-D K-S statistic. Thus, the 2-D K-S test's added value is the simultaneous inclusion of the period distribution with the radius distribution for a comprehensive goodness-of-fit assessment.
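The rounding step and one possible sample-size-independent χ² can be sketched as follows. The paper does not spell out its normalization, so the choice below (rescale the mean simulated histogram and its per-bin scatter over realizations to Kepler's total count, then average the χ² terms over bins) is our assumption, as are the function names. Python's round() acts on binary floats with round-half-to-even, so exact ties may not match printed catalog values.

```python
def round_radii(radii):
    """Round radii to the nearest 0.1 R_earth, mimicking the catalog precision."""
    return [round(r, 1) for r in radii]


def radius_histogram(radii, edges):
    """Counts per bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for r in radii:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= r < edges[i + 1] or (last and r == edges[-1]):
                counts[i] += 1
                break
    return counts


def reduced_chi2(kepler_counts, sim_mean, sim_std):
    """Reduced chi-square after rescaling the simulated histogram to Kepler's total.

    The rescaling (our assumption) removes the dependence on the total number of
    detectable planets; sim_std is the per-bin scatter over the realizations.
    """
    scale = sum(kepler_counts) / sum(sim_mean)
    terms = [(k - scale * m) ** 2 / (scale * s) ** 2
             for k, m, s in zip(kepler_counts, sim_mean, sim_std) if s > 0]
    return sum(terms) / len(terms)
```

With this normalization a simulated histogram that has the right shape but twice as many planets as Kepler scores χ² = 0, so the statistic isolates the shape of the radius distribution from the occurrence-rate question.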

4. THE TRANSIT-RV COMPARISON: RESULTS 
4.1. Single-Valued M-R 

The result of 100 realizations of the R/R_⊕ = (M/M_⊕)^0.48 mass-to-radius relationship (§3.2.1) computed at a 40% overall occurrence rate is illustrated in Figure 2. The color denotes the total number of detectable (§3.4) simulated planets (N_detect) with 1 < R < 3.9 R_⊕ (= 17^0.48 R_⊕, the radius of a 17 M_⊕ planet) and 2 < P < 50 days, averaged over all N = 100 realizations; the total number of analogous Kepler planet candidates in our filtered list is 631. Figure 2 indicates that N_detect depends sensitively on the mass power-law index α and the period power-law index β (§3.1). Thus, we need to identify the α and β that give the best fit between the Kepler-detectable population and Kepler's planet candidates in order to conclude that any discrepancy between the total numbers of observed planets is a result of the overall 30-50% statistic, not a result of using an inappropriate power law.
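A quick numerical check shows where the 3.9 R_⊕ upper bound comes from under the single-valued relation R/R_⊕ = (M/M_⊕)^0.48, and how the relation inverts. The function names are hypothetical.

```python
def single_valued_radius(mass_earth, exponent=0.48):
    """R/R_earth = (M/M_earth)**0.48, the single-valued M-R of Section 3.2.1."""
    return mass_earth ** exponent


def single_valued_mass(radius_earth, exponent=0.48):
    """Inverse relation, M/M_earth = (R/R_earth)**(1/0.48)."""
    return radius_earth ** (1.0 / exponent)


print(round(single_valued_radius(17.0), 2))  # upper radius bound of the mass range
print(round(single_valued_mass(4.0), 1))     # mass that a 4 R_earth planet would need
```

Because 17^0.48 ≈ 3.9, this M-R cannot populate the 3.9 to 4 R_⊕ sliver of the Kepler radius window; a 4 R_⊕ planet would need roughly 18 M_⊕, just above the 17 M_⊕ cap.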

Calculating both the 2-D K-S test and the 1-D χ² test allows us to identify these best-fitting free parameters (§3.5). The 2-D K-S test gives us the probability 𝒫 that the two populations were drawn from the same underlying radius and period distribution; for the single-valued M-R considered here, even the maximum 𝒫_sing is very low. In Figure 2 we have boxed in white the values of α and β that produce a median 𝒫_sing > 10^-4 over all N = 100 realizations: α = -1.8, β = 0.0 and α = -1.7, β = 0.0. These values correspond to N_detect = 537 ± 28 and N_detect = 574 ± 28, respectively. Thus, a 40% occurrence rate for R/R_⊕ = (M/M_⊕)^0.48 actually under-predicts the total number of planets that Kepler would see in its first four months of data (compared with the 631 candidates in our filtered list).

Before we can discuss the implications of this result, however, we must address the low probabilities produced by the 2-D K-S test. Figure 3, representing one realization of the data at α = -1.8 and β = 0.0, sheds some light on the cause of this issue: this single-valued M-R does not produce enough planets with 1.8 < R < 3 R_⊕ and P < 20 days. Our treatment of Kepler's sample incompleteness does not seem to be at fault, as there are very few detectable simulated planets at small radii and long periods, where Kepler's detectability criterion (§3.4) rejects the most planets. Furthermore, our use of Kepler target stars (§3.3) precludes Kepler's selection biases as an explanation for this discrepancy. We therefore turn to a more flexible mass-to-radius relationship as a potential means to improve the transit-RV fit.

Figure 3. Period vs. radius for a single realization of the simulated planet population produced by the R = M^0.48 mass-to-radius relationship with α = -1.8, β = 0.0, and a 40% overall occurrence rate. The Kepler planet candidates are marked with circles, and the detectable simulated planets are marked with plus signs. Comparing the two data sets yields a 2-D K-S probability 𝒫_sing = 0.03%. Note the relative paucity of simulated planets in the 1.8 < R < 3 R_⊕ and P < 20 days range.

Figure 4. Period vs. radius for a single realization of the simulated planet population produced by the multi-valued mass-to-radius relationship with α = -1.0, β = 0.0, f_rocky(1) = 0.9, f_rocky(17) = 0.1, and a 40% overall occurrence rate. The black circles denote the Kepler planet candidates; the red plus signs denote the detectable simulated planets with a rocky composition; the green diamonds denote the detectable simulated planets with a gaseous composition; and the blue asterisks denote the detectable simulated planets with a half-rock, half-ice composition, which could be produced by significant mass loss from the gaseous planets. Comparing the two data sets yields a 2-D K-S probability 𝒫_mult = 0.13%.

4.2. Multi-Valued M-R 

Figure 4 displays the period-radius distribution for one realization of our multi-valued M-R (§3.2.2). A qualitative comparison with Figure 3 indicates that the multi-valued M-R does give a better fit than the single-valued M-R, which is corroborated quantitatively by an order-of-magnitude improvement in the 2-D K-S probability (𝒫_mult ~ 0.1%). Unfortunately, however, this probability is still very low. To isolate the discrepancy, we compute χ² for the radius distribution averaged over 100 realizations of the simulated planet population; these distributions are shown in Figure 5 and correspond to reduced χ² values of 1.2 (single-valued M-R) and 1.8 (multi-valued M-R). The fact that the reduced χ² values are close to unity is encouraging and indicates that the discrepancy lies in the simulated planets' period distribution, which the 2-D K-S test has the leverage to assess (§3.5).

A discrepancy in period is not unexpected considering the simplifying assumptions we made about the input period distribution (§3.1). Given that 𝒫_mult > 𝒫_sing, however, the detectable period distribution also depends on the population-wide M-R. It is likely that other population-wide M-Rs could further improve the transit-RV fit, but it remains unclear whether these M-Rs would be parameterizable with a reasonable number of degrees of freedom. We therefore chose simplicity over absolute best fits to offer M-Rs that are both physically intuitive and computationally feasible.

Figure 5. Number of planets vs. radius averaged over all N = 100 realizations of the simulated planet population (striped pattern); the error bars denote the standard deviation of the number of detectable planets in each bin. The radius distribution of the Kepler planet candidates is displayed as the thick black line. Top: the single-valued M-R with α = -1.8, β = 0.0, and a 40% overall occurrence rate, for which χ² = 1.2. Bottom: the multi-valued M-R with α = -1.0, β = 0.0, f_rocky(1) = 0.9, f_rocky(17) = 0.1, and a 40% overall occurrence rate, for which χ² = 1.8. This χ² value is slightly higher due to the smaller error bars in the wings, where the two data sets also happen to produce a poorer match.

Table 1
Total Number of Detectable Planets for the Best-fitting Parameters of the Multi-Valued M-R and a 40% HARPS Occurrence Rate

f_rocky(1)   f_rocky(17)   α       β     𝒫_mult   reduced χ²   N_detect (40%)
0.7          0.2           -1.1    0.0   0.11%    2.0          679 ± 27
0.8          0.1           -1.1    0.0   0.11%    1.7          657 ± 26
0.8          0.1           -1.0    0.0   0.11%    1.9          694 ± 27
0.8          0.2           -1.0    0.0   0.16%    2.0          685 ± 27
0.9          0.1           -1.1    0.0   0.11%    1.6          629 ± 27
0.9          0.1           -1.0    0.0   0.18%    1.8          665 ± 28
0.9          0.2           -1.0    0.0   0.17%    1.9          657 ± 28
1.0          0.0           -1.0    0.0   0.14%    1.6          645 ± 27
1.0          0.1           -1.0    0.0   0.18%    1.7          637 ± 27
1.0          0.1           -0.9    0.0   0.21%    2.0          675 ± 29
1.0          0.2           -1.0    0.0   0.12%    2.0          629 ± 27

Figure 6 shows the total number of detectable simulated planets (N_detect) averaged over 100 realizations of the simulated population for a 40% overall occurrence rate. The axes correspond to the two free parameters that characterize the multi-valued M-R in addition to α and β: (1) the fraction of all 1 M_⊕ planets that have a rocky composition, f_rocky(1), and (2) the fraction of all 17 M_⊕ planets with a rocky composition, f_rocky(17). As with Figure 2, the best-fitting free parameters in Figure 6 are outlined with white boxes; in this case, a good fit is identified by a median 𝒫_mult > 0.1% over all 100 realizations of the simulated population and a reduced χ² < 2.0 for the radius distributions analogous to those in Figure 5. The values of these statistics as well as the total numbers of detectable simulated planets are listed in Table 1.

N_detect varies with f_rocky(1) and f_rocky(17) and depends strongly on α and β. Given that the total number of Kepler planet candidates in our filtered list is 631, the best-fit free parameters yield total numbers of detectable planets that are similar between the two data sets. Before any conclusions can be drawn from this, however, we must account for the probability of false positives among Kepler's planet candidates. Fortunately, the 5-15% false alarm rate, which could lower the total number of actual planets in our filtered list to 535 in the worst-case scenario, falls within HARPS' ±10% error bars: because N_detect is directly proportional to the overall occurrence rate, Kepler would have detected ~500 planets in its first four months of data if 30% of main sequence stars host at least one planet, with α = -1.0, β = 0.0, f_rocky(1) = 0.9, and f_rocky(17) = 0.1 parameterizing the population. Thus, our multi-valued M-R gives total numbers of detectable simulated planets that are similar to Kepler's total number of planet candidates even when the false alarm probability is taken into account, indicating that the HARPS overall 30-50% statistic can in fact be consistent with Kepler's results.
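The occurrence-rate scaling used above is simple proportionality, which the following sketch makes explicit. The only assumption is the one stated in the text, that N_detect scales linearly with the overall occurrence rate; the N_detect values are copied from Table 1 and the function name is hypothetical.

```python
# Best-fit N_detect values at a 40% occurrence rate, copied from Table 1.
N_DETECT_40 = [679, 657, 694, 685, 629, 665, 657, 645, 637, 675, 629]


def scale_n_detect(n_detect_at_40pct, occurrence_rate):
    """N_detect is directly proportional to the overall occurrence rate."""
    return n_detect_at_40pct * occurrence_rate / 0.40


print(scale_n_detect(665, 0.30))  # alpha=-1.0, beta=0.0, f_rocky(1)=0.9, f_rocky(17)=0.1
print(min(scale_n_detect(n, 0.30) for n in N_DETECT_40),
      max(scale_n_detect(n, 0.30) for n in N_DETECT_40))
```

At a 30% occurrence rate the highlighted parameter set gives about 499 detectable planets, and the full set of best-fitting parameters in Table 1 spans roughly 472 to 521, bracketing the "~500" quoted above.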
Figure 6. The total number of detectable (§3.4) simulated planets (N_detect) with 2 < P < 50 days and 1 < R < 4 R_⊕ produced by the multi-valued mass-to-radius relationship (§3.2.2) for a 40% overall occurrence rate; this total number is averaged over 100 realizations. As a point of reference, the total number of Kepler planet candidates that we use for this comparison is 631. The axes denote the fraction of all 1 M_⊕ planets in the simulated planet population that have a rocky composition, f_rocky(1), and the fraction of all 17 M_⊕ planets that have a rocky composition, f_rocky(17); each panel corresponds to a different value of the mass power-law index (α) at a constant period power-law index of β = 0.0. The white boxes outline the values of f_rocky(1) and f_rocky(17) that produce both a median probability 𝒫_mult > 0.1% over all N = 100 realizations and a reduced χ² < 2.0 for the radius distributions analogous to those in Figure 5.



5. DISCUSSION 

Because population-wide mass-to-radius relationships (M-R) are central to transit-RV comparison studies, they require realistic treatments of planet composition across the entire population. Ideally, the requisite assumptions about the population's bulk density distribution would be informed by observations of planets that are detected by both methods. Unfortunately, however, current observations of sub-Neptune-mass planets lack sufficient data to address this issue: Kepler-11 b-f (Lissauer et al. 2011a), whose mass measurements have significant errors (±30-100%), are the only confirmed transiting planets that fall securely in the mass and period ranges considered in this paper (1 < M < 17 M_⊕ and 2 < P < 50 days). A single-valued mass-to-radius relationship derived from these few dually-detected low-mass exoplanets (§3.2.1) furthermore provides a poor fit to the Kepler planet candidates' period-radius distribution (§4.1), suggesting that the assumptions such a power law makes about the population's density distribution are incorrect.

Given that the first two dually-detected "super-Earth" planets have similar masses but different bulk densities (CoRoT-7 b: Queloz et al. 2009; Léger et al. 2009; GJ 1214 b: Charbonneau et al. 2009), it is not surprising that a single-valued M-R produces poor agreement between transit and RV survey results. Hints of a multi-valued M-R that allows planets with different densities to occur at the same mass have continued to emerge with more recent detections: most of the Kepler-11 planets have low bulk densities (0.5-3.1 g/cm³; Lissauer et al. 2011a), while Kepler-10 b and 55 Cnc e yield densities of 9 g/cm³ (Batalha et al. 2011) and 5-6 g/cm³ (Winn et al. 2011; Demory et al. 2011), respectively. A popular explanation for this compositional bimodality is that the high-density planets, which so far are all observed on extremely close-in orbits (P < 2 days), constitute the special case of low-mass gas planets that have had their atmospheres completely stripped, leaving only their solid cores behind (Schaefer & Fegley 2009; Jackson et al. 2010; Batalha et al. 2011). Instead, we propose that these high-density planets constitute a more general short-period (and thus more easily detectable) case of an entirely different class of exoplanets: true super-Earths that formed with a primarily refractory composition. This new interpretation has significant implications for planet formation (e.g., Hansen & Murray 2011), suggesting that there may be multiple modes of formation for planets in this mass range (Léger et al. 2011). To ascertain whether this is the case, however, we must break the degeneracy between the two interpretations. The current state of observational data on individual planets cannot accomplish this task, but a statistical transit-RV study such as the one conducted here can potentially elucidate the correct interpretation: considering transit surveys' bias toward larger and thus lower-density planets, and RV surveys' bias toward more massive and thus higher-density planets, statistical discrepancies between the observed radius and period distributions could indicate that a complex population-wide M-R is at play.

As a result, we believe that a multi-valued M-R is crucial both for explaining the apparent discrepancy between the Kepler and HARPS occurrence rates and for developing a full understanding of the Galactic planetary population. The multi-valued M-R we present here (§3.2.2) adopts two compositions: rocky planets that follow the same power-law mass-to-radius relationship as the Solar System's inner planets, and gaseous planets that follow the M-R curves presented in Rogers et al. (2011), while a prescription for atmospheric mass loss introduces a third, intermediate composition. An admixture of these compositions over the entire 1 M_⊕ < M < 17 M_⊕ mass range is able to account for the density variation that exists among low-mass planets. We emphasize that the order-of-magnitude mass-loss prescription we appeal to here does not attempt to model the details of atmospheric escape; we use it only as a way to account for the evolution of a gaseous planet's radius in the low-mass, large-radius regime. Interestingly, the presence of intermediate-density planets in a region of period-radius parameter space unoccupied by rocky or gaseous planets suggests that an intermediate-density planet population, however its constituent planets were formed, is another key component of the transit-RV comparison.

For our multi-valued M-R we have placed particular emphasis on parameterizing the relative contributions from the rocky and gaseous compositions in as physically intuitive a way as possible, while taking care to minimize the number of free parameters. As a result, we adopt a parameterization that flows naturally from the coexistence of rocky super-Earths and gaseous sub-Neptunes at each planet mass and involves only two additional degrees of freedom: (1) the fraction of all 1 M_⊕ planets in the simulated planet population that have a rocky composition, f_rocky(1), and (2) the fraction of all 17 M_⊕ planets that have a gaseous composition, 1 - f_rocky(17), with f_rocky varying linearly between the bounding masses.
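The two-parameter composition model can be sketched as a linear interpolation followed by a Bernoulli draw per planet. This is a simplified illustration with hypothetical function names: it assigns only "rocky" or "gaseous" and does not model the mass-loss step that produces the third, intermediate composition.

```python
import random


def f_rocky(mass_earth, f_rocky_1, f_rocky_17):
    """Rocky fraction varying linearly from f_rocky(1) at 1 M_earth to f_rocky(17) at 17 M_earth."""
    return f_rocky_1 + (f_rocky_17 - f_rocky_1) * (mass_earth - 1.0) / 16.0


def draw_composition(rng, mass_earth, f_rocky_1, f_rocky_17):
    """Label a simulated planet 'rocky' or 'gaseous' (mass loss is not modeled here)."""
    return "rocky" if rng.random() < f_rocky(mass_earth, f_rocky_1, f_rocky_17) else "gaseous"


rng = random.Random(7)
labels = [draw_composition(rng, 9.0, 0.9, 0.1) for _ in range(10000)]
print(labels.count("rocky") / len(labels))   # should be close to f_rocky(9) = 0.5
```

For the f_rocky(1) = 0.9, f_rocky(17) = 0.1 case highlighted in Figures 4 and 5, a 9 M_⊕ planet is equally likely to be rocky or gaseous, so dense and puffy planets coexist at the same mass, which is exactly what a single-valued M-R cannot express.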

Not only is a multi-valued M-R physically intuitive, but it also better fits Kepler's second-quarter planet candidates (§4.2). This improved fit enables us to address the question of planet occurrence rates, as the total number of simulated planets that pass Kepler's detectability criterion (§3.4) can only be attributed to the overall occurrence rate once the two distributions of detectable/detected planets are consistent with each other (§4.1). As a result, we find that HARPS' 40% occurrence rate is in fact consistent with Kepler's planet candidates for the range of best-fitting parameters in our simulations: α = -1.1 to -0.9, β = 0.0, f_rocky(1) = 0.7 to 1.0, and f_rocky(17) = 0.0 to 0.2. The apparent discrepancy between the HARPS and Kepler occurrence rates can therefore be naturally explained by the presence of dense planets in the HARPS data set: planets that, due to their relatively small radii, Kepler simply did not find after four months of data collection.

Caution, of course, is in order. We have made a number of assumptions in our simulations, the most stringent of which was restricting each host star to only one planet (§3.1). We accounted for this by only considering the first planet candidate in our radius and period range to be listed by the Kepler pipeline in each multiple-planet system, which generates a ~20% overall reduction of included planet candidates (§3.5). To be sure, a multiple-planet prescription could be applied to the simulated planets, which would allow all of the Kepler planet candidates with 1 < R < 4 R_⊕ and 2 < P < 50 days to be included in the comparison. However, the HARPS occurrence rate offers no information about the appropriate multiple-planet assumptions to make, and including such assumptions only muddies the clear implication of the seemingly discrepant RV and transit occurrence rates.

Nonetheless, our use of only single-planet systems necessitates close consideration of the meaning of an "occurrence rate" as well as of how these "occurrence rates" are calculated from study to study, as pointed out by both Howard et al. (2011) and Youdin (2011). The HARPS overall occurrence rate, which makes a statement about the fraction of stars with planets, treats the presence of planets around stars as a binary state: either the star hosts no planets, or it hosts one or more planets, making our single-planet assumption a very natural one to adopt. On the other hand, the occurrence rates computed by Howard et al. (2011) and Youdin (2011) include the possibility of multiple-planet systems and give the number of planets per star (NPPS), rather than the fraction of stars with planets (FSWP). With information about the distribution of multiple-planet systems, such as that offered by Latham et al. (2011) and Tremaine & Dong (2011), an NPPS occurrence rate can be directly compared to an FSWP occurrence rate. For our purposes, we simply note that the occurrence rates that Howard et al. (2011) and Youdin (2011) compute (0.13 and 0.19 planets per star, respectively, for 2 < R < 4 R_⊕ and P < 50 days) would become even lower when transformed to an FSWP occurrence rate, given the presence of multiple-planet systems; this only worsens the apparent discrepancy between the two surveys' occurrence rates. Youdin (2011) does point out, however, that if planets down to 0.5 R_⊕ are included, then this number-of-planets-per-star occurrence rate may be as high as 1.36. Thus, for the full 1 < R < 4 R_⊕ range we consider in this paper, the apparent occurrence rate discrepancy may also be explained, at least in part, by the slight differences in the considered radius range.
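The NPPS-to-FSWP conversion mentioned above amounts to dividing by the mean number of planets per planet-hosting star. The sketch below assumes the multiplicity distribution refers to the same radius and period window as the NPPS value; the distribution used in the example is hypothetical and chosen only for illustration, and the function name is ours.

```python
def fswp_from_npps(npps, multiplicity):
    """Convert planets per star (NPPS) to fraction of stars with planets (FSWP).

    multiplicity maps k -> fraction of planet-hosting stars with exactly k planets
    in the same radius/period window; the fractions should sum to 1.
    """
    mean_planets_per_host = sum(k * frac for k, frac in multiplicity.items())
    return npps / mean_planets_per_host


# Hypothetical multiplicity distribution, for illustration only.
example_multiplicity = {1: 0.80, 2: 0.15, 3: 0.05}
print(round(fswp_from_npps(0.13, example_multiplicity), 3))   # NPPS from Howard et al. (2011)
```

Any host with more than one planet pushes the mean above 1, so the FSWP is always at most the NPPS; this is why converting the Howard et al. (2011) and Youdin (2011) rates to FSWP only widens the apparent gap with HARPS.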

Another potential source of concern is the difference between the two surveys' target star selection criteria. We address the biases produced by Kepler's selection criteria by drawing from the Q2 target stars, and we account for its incompleteness by including the Q2 3-hour CDPP measurements; these considerations stem from how we frame the transit-RV comparison, as we ask how many short-period, low-mass planets Kepler would have detected in its first four months of data if the HARPS occurrence rate is true. However, to make a thorough comparison one also needs to consider how these biases differ from the RV selection criteria that factor into HARPS' overall occurrence rate. Both HARPS and Kepler preferentially choose G and K dwarfs with high signal-to-noise ratios (Mayor et al. 2001; Udry et al. 2000; Batalha et al. 2010), but HARPS also targets slowly rotating, magnetically quiet stars and includes no known spectroscopic binaries. Thus, the differences between the two surveys' selection criteria lie in the presence of binary stars in the Kepler sample and in the distinction between RV stellar jitter and photometric noise.
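As a rough illustration of how a 3-hour CDPP value can enter a detectability estimate, the sketch below computes an approximate transit S/N. It is not the detection model used in this paper. The helper `transit_snr`, the circular central-transit duration, the square-root scaling of noise with in-transit time, the 120-day baseline standing in for "four months," and the 7.1σ threshold (the value commonly quoted for the Kepler transit search) are all assumptions made for illustration.

```python
import math

# Physical constants (approximate).
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0
AU_IN_R_SUN = 215.03  # 1 AU is about 215 solar radii


def transit_snr(rp_earth, rstar_sun, mstar_sun, period_days,
                cdpp3_ppm, baseline_days=120.0):
    """Approximate transit S/N from a 3-hour CDPP (hypothetical helper).

    Assumes a circular orbit, a central transit (impact parameter 0),
    R_* << a, and noise that averages down as sqrt(in-transit time / 3 hr).
    """
    # Transit depth in ppm, ignoring limb darkening.
    depth_ppm = 1e6 * (rp_earth * R_EARTH_KM / (rstar_sun * R_SUN_KM)) ** 2
    # Kepler's third law in AU / yr / M_sun units.
    a_au = (mstar_sun * (period_days / 365.25) ** 2) ** (1.0 / 3.0)
    a_over_rstar = a_au * AU_IN_R_SUN / rstar_sun
    # Central-transit duration: T ~ (P / pi) * (R_* / a), in hours.
    duration_hr = 24.0 * period_days / (math.pi * a_over_rstar)
    # Number of transits falling inside the assumed observing baseline.
    n_transits = math.floor(baseline_days / period_days)
    return depth_ppm / cdpp3_ppm * math.sqrt(n_transits * duration_hr / 3.0)


SNR_THRESHOLD = 7.1  # assumed detection threshold

snr = transit_snr(rp_earth=2.0, rstar_sun=1.0, mstar_sun=1.0,
                  period_days=10.0, cdpp3_ppm=100.0)
print(round(snr, 1), snr > SNR_THRESHOLD)
```

Under these assumptions a 2 R⊕ planet on a 10-day orbit around a Sun-like star reaches an S/N of roughly 13 at a CDPP3 of 100 ppm, but falls below the assumed threshold at 400 ppm, which is one way photometric noise can shape Kepler's completeness for small planets.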

According to Batalha et al. (2010), Kepler searches for planets around all of the known eclipsing binaries (> 600) in its field of view. While these eclipsing binaries are not numerous enough by themselves to appreciably affect our statistics, the unidentified spectroscopic binaries in Kepler's field of view potentially are, if one reasonably allows for the possibility that the planet occurrence rate can differ between single stars and binary systems. To get a sense of the magnitude of this effect, we refer to Duquennoy & Mayor (1991), who estimate that as many as two thirds of all G dwarfs have a stellar companion. The lognormal period distribution they find for spectroscopic G-dwarf binaries indicates that roughly 8% of all G dwarfs exist in binaries separated by < 0.5 AU and ~20% in binaries separated by < 10 AU; considering that Kepler's false-positive vetting process enables binaries at separations of < 1″ (Batalha et al. 2011) to be identified, the relative fraction of tight binaries in the Kepler target star list could be even higher. Separations of < 0.5 AU and < 10 AU are especially of interest for the survival and formation of planets in binary systems, as the orbits of the planets considered in this paper would not be stable in equal-mass binary systems separated by < 0.5 AU, and protoplanetary disks around the primaries of < 10 AU binary systems would be truncated before the distance at which an ice line could form. Interestingly, a difference in the planet occurrence rate for binaries with < 10 AU separations versus those with > 10 AU separations could provide a way to discriminate between the compositions of these close-in planets, assuming that the terrestrial planets formed in situ and the gaseous planets migrated in from wider orbits.
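The binary fractions quoted above can be roughly reproduced from the Duquennoy & Mayor (1991) period distribution. The sketch below assumes their Gaussian fit in log10(P/day) with mean 4.8 and standard deviation 2.3, a companion fraction of two thirds, a total binary mass of 2 M_sun (equal-mass G dwarfs), and that "separation" means the orbital semi-major axis; the helper names are hypothetical, and the results shift by roughly a percentage point for other plausible total masses.

```python
import math

# Assumed Duquennoy & Mayor (1991) fit for G-dwarf companions:
# log10(P / day) is Gaussian with mean 4.8 and standard deviation 2.3.
MU_LOGP = 4.8
SIGMA_LOGP = 2.3
COMPANION_FRACTION = 2.0 / 3.0  # "as many as two thirds" of G dwarfs


def period_days(a_au, m_total_sun):
    """Kepler's third law: P[yr]^2 = a[AU]^3 / M_total[M_sun]."""
    return 365.25 * math.sqrt(a_au ** 3 / m_total_sun)


def fraction_within(a_au, m_total_sun=2.0):
    """Fraction of all G dwarfs with a companion whose semi-major axis
    is below a_au (hypothetical helper)."""
    z = (math.log10(period_days(a_au, m_total_sun)) - MU_LOGP) / SIGMA_LOGP
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return COMPANION_FRACTION * normal_cdf


for a in (0.5, 10.0):
    print(f"< {a} AU: {fraction_within(a):.1%}")
```

With these assumptions the two fractions come out near 7% and 23%, consistent at this level of approximation with the "roughly 8%" and "~20%" quoted above.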

The HARPS requirement that its target stars have low levels of RV stellar jitter is another potentially significant difference between the two surveys' target selection criteria. It is certainly the case that Kepler has preferentially chosen target stars that exhibit low photometric noise (Batalha et al. 2010), but this noise is primarily correlated with the apparent magnitude of the star (see Figure 1) and does not necessarily reflect the degree of magnetic activity that heavily factors into the HARPS log(R'_HK) < -4.8 target selection. If we temporarily ignore this, however, and assume that photometric noise is strongly correlated with stellar jitter, we can assess the effect of this selection criterion on our results. We find that limiting our potential host stars to the ~35,000 Kepler targets with CDPP3 < 150 ppm worsens the discrepancy between the Kepler and HARPS occurrence rates: for α = -1.0, β = 0.0, f_rocky(1) = 0.9, f_rocky(17) = 0.1, and a 40% overall occurrence rate, we find that Kepler would have been able to detect 291 ± 19 planets in its first four months of data (N_realizations = 100), while Kepler has actually found 217 planet candidates around stars with CDPP3 < 150 ppm. A 30% HARPS occurrence rate is needed to bring these numbers into agreement, making the HARPS-Kepler consistency marginal at best, although a high spectroscopic binary fraction in the Kepler sample could counteract this effect and improve the consistency between the occurrence rates. In any case, given the absence of spectra for a majority of the Kepler targets, systematically accounting for the selection of quiet stars requires the forthcoming results of stellar photometric variability studies (e.g., Basri et al. 2011) to draw conclusions about these stars' magnetic activity.
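If the other population parameters (α, β, and f_rocky) are held fixed, the overall occurrence rate acts as a simple normalization, so the expected number of Kepler detections should scale linearly with it. The short sketch below uses only the numbers quoted above to show why a 30% rate brings the CDPP3 < 150 ppm prediction into agreement with the 217 observed candidates; the helper name is hypothetical, and the tension measure is deliberately crude (it ignores, for instance, Poisson uncertainty in the observed count).

```python
# Values quoted in the text for Kepler targets with CDPP3 < 150 ppm.
PREDICTED_AT_40PCT = 291.0   # mean over 100 Monte Carlo realizations
SCATTER = 19.0               # quoted realization-to-realization scatter
OBSERVED = 217               # Kepler candidates around these stars


def scaled_prediction(rate, ref_rate=0.40, ref_count=PREDICTED_AT_40PCT):
    """Expected detections, assuming linear scaling with the occurrence rate."""
    return ref_count * rate / ref_rate


tension = (PREDICTED_AT_40PCT - OBSERVED) / SCATTER
print(f"excess at 40%: {tension:.1f} x scatter")
print(f"prediction at 30%: {scaled_prediction(0.30):.0f}")
```

The 40% prediction exceeds the observed count by nearly four times the quoted scatter, while scaling to 30% gives about 218 expected detections, essentially matching the 217 candidates.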

In short, we acknowledge that the differences in the two surveys' target star selection criteria could explain some of the apparent discrepancy between their occurrence rates. Our intent here is simply to point out a plausible, testable explanation for an overall transit-RV occurrence rate discrepancy (the existence of a distribution of densities in a planet population) that does not depend on the selection criteria to produce similar numbers of observable planets.

As a final note, we remark that significant errors in the Kepler target stars' radii could affect the best-fitting parameters that we find for our multi-valued M-R. Based on the uncertainty in the Kepler Input Catalog's estimates of T_eff and log(g) (Brown et al. 2011), the stellar radii (and thus the radii of Kepler's planet candidates) are uncertain by tens of percent. These errors are substantial; however, the benefit of performing a statistical analysis such as the one presented here is that normally distributed errors will tend to average out, given a large enough sample. Unfortunately, the errors on the KIC radii are not necessarily normally distributed, and in at least one instance they are known to be severely biased in one direction: unidentified subgiant stars in the target star list can have their radii underestimated by as much as a factor of 2 (Brown et al. 2011). We have attempted to minimize the effect of such a severe systematic error by limiting the Kepler target stars we consider to only those with log(g) > 4.0 (§3.3), but this does not guarantee that our sample of potential host stars is completely free of systematic biases that could change the best fits we calculate in our simulations.
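The contrast between random and systematic radius errors can be illustrated with a toy simulation. The sketch below assumes 20% Gaussian fractional errors on every stellar radius and, optionally, a hypothetical fraction of misclassified subgiants whose radii are underestimated by a factor of 2; because a transiting planet's radius is approximately R_p = R_* sqrt(δ) for transit depth δ, the planet radii inherit the same ratio. The helper name and all parameter values are illustrative assumptions, not quantities from the KIC.

```python
import random


def mean_radius_ratio(n, sigma, subgiant_fraction, rng):
    """Sample mean of (inferred / true) stellar radius for n synthetic stars.

    Every star gets an unbiased Gaussian fractional error of width sigma;
    a subgiant_fraction of stars (hypothetical misclassified subgiants)
    additionally has its radius underestimated by a factor of 2.
    Planet radii inherit the same ratio, since R_p is roughly R_* * sqrt(depth).
    """
    total = 0.0
    for _ in range(n):
        ratio = 1.0 + rng.gauss(0.0, sigma)  # random, unbiased error
        if rng.random() < subgiant_fraction:
            ratio *= 0.5                     # systematic underestimate
        total += ratio
    return total / n


rng = random.Random(42)
print(mean_radius_ratio(100_000, 0.2, 0.0, rng))  # random errors only
print(mean_radius_ratio(100_000, 0.2, 0.2, rng))  # plus 20% misclassified subgiants
```

The first mean stays very close to 1, since the random errors average out, whereas the second settles near 0.9 (0.8 × 1 + 0.2 × 0.5) no matter how large the sample becomes.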

Fortunately, in this paper we are more concerned with the total number of detectable planets than with the precise best-fit parameters, because of its implications for the low-mass planet occurrence rates calculated by different planet detection techniques. Considering that the ±10% error bars in the HARPS overall occurrence rate can account for the variation in the total number of planets produced by changes in the multi-valued M-R's degrees of freedom (§4.2), as well as by a possible 5-15% false positive rate among Kepler's planet candidates (Morton & Johnson 2011), we consider it likely that the HARPS and Kepler occurrence rates are actually consistent with each other, implying that there is a distribution of super-Earth/sub-Neptune densities at each planet mass. We have therefore illustrated the importance of using a multi-valued M-R when comparing RV and transiting planet populations, and we have shown that HARPS is likely detecting a large population of dense low-mass planets.